
arXiv.org e-Print Archive

LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs

Author: Amann Bernd
Curé Olivier
Naacke Hubert
Randriamalala Tendry
Publication date: 12/10/2015

The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are increasingly confronted with various "big data" problems. Query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach for compressing triple data sizes by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme that uses a clever encoding of concept and property hierarchies to efficiently evaluate the most common RDFS entailment rules while minimizing triple materialization and query rewriting. We show how this encoding can be computed by a scalable parallel algorithm and implemented directly over the Apache Spark framework. The efficiency of our encoding scheme is demonstrated by an evaluation conducted over both synthetic and real-world datasets. Comment: 8 pages, 1 figure
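The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea of hierarchy-aware integer identifiers. It assumes a tree-shaped class hierarchy and uses a simple pre-order interval numbering as a stand-in technique, which is not necessarily the scheme the paper uses. The class names, `encode_hierarchy` and `is_subclass` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy hierarchy given as (subclass, superclass) pairs.
HIERARCHY = [
    ("Person", "Thing"),
    ("Course", "Thing"),
    ("Student", "Person"),
    ("Professor", "Person"),
    ("PhDStudent", "Student"),
]


def encode_hierarchy(sub_class_of, root):
    """Map each class to (id, upper), where ids come from a pre-order walk
    and `upper` is the exclusive upper bound of the ids in its subtree."""
    children = defaultdict(list)
    for child, parent in sub_class_of:
        children[parent].append(child)

    ids = {}
    counter = 0

    def visit(node):
        nonlocal counter
        my_id = counter
        counter += 1
        for child in sorted(children[node]):  # sorted for determinism
            visit(child)
        ids[node] = (my_id, counter)

    visit(root)
    return ids


def is_subclass(ids, sub, sup):
    """subClassOf check (reflexive) as a plain integer range comparison."""
    lower, upper = ids[sup]
    return lower <= ids[sub][0] < upper


ids = encode_hierarchy(HIERARCHY, "Thing")
print(is_subclass(ids, "PhDStudent", "Person"))
print(is_subclass(ids, "Course", "Person"))
```

With identifiers of this kind, a query over a class can be answered by a range filter on integer ids instead of rewriting the query or materializing inferred triples, which is the trade-off the abstract describes.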

Crossref

arXiv.org e-Print Archive

On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

Author: Amann Bernd
Baazizi Mohamed-Amine
Curé Olivier
Naacke Hubert
Publication date: 08/07/2015

Querying very large RDF data sets efficiently requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. To achieve fair experimental results, we use Apache Spark as a common parallel computing framework and rewrite the algorithms concerned using the Spark API. Spark provides guarantees in terms of fault tolerance, high availability and scalability, which are essential in such systems. Our implementations aim to highlight the fundamental, implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and, to some extent, query answering cost and performance. The presented measures are obtained by testing each system on one synthetic and one real-world data set over query workloads with differing characteristics and different partitioning constraints. Comment: 16 pages, 3 figures

HAL Descartes

INRIA a CCSD electronic archive server

Conflict Ontology Enrichment Based on Triggers

Author: Curé Olivier
Smaïli Kamel
Zakaria Chahnaz
Publication venue: HAL CCSD
Publication date: 01/01/2008

International audience. In this paper, we propose an ontology-based approach that makes it possible to detect the emergence of relational conflicts between people who cooperate on computer-supported projects. To detect these conflicts, we use this ontology to analyze the e-mails exchanged between these people. Our method aims to inform project team leaders of such situations and thereby help them prevent serious disagreements between the employees involved. The approach we present builds a domain ontology of relational conflicts in two phases. First we conceptualize the domain by hand, then we enrich the ontology by using the trigger model, which makes it possible to find terms in corpora that correspond to different conflicts.

Crossref

HAL Descartes

INRIA a CCSD electronic archive server

Identifying Conflicts Through Emails by Using an Emotion Ontology

Author: Curé Olivier
Smaïli Kamel
Zakaria Chahnaz
Publication venue: HAL CCSD
Publication date: 06/05/2009

International audience. Following a text classification approach, this paper presents a method for detecting conflicts in emails exchanged between colleagues who belong to a geographically distributed enterprise. The idea is to inform a team leader of such situations and thereby help them prevent serious disagreements between team members. This approach uses the vector space model with TF*IDF weights to represent each email, and a domain ontology of relational conflicts to determine its categories. Our study also addresses the issue of building the ontology, which proceeds in two phases. First we conceptualize the domain by hand, then we enrich it by using the trigger model, which makes it possible to find terms in corpora that correspond to different conflicts.
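The vector space representation mentioned in this abstract can be sketched as follows. This is a minimal illustration of one common TF*IDF variant (relative term frequency times the natural log of inverse document frequency). The abstract does not say which weighting variant or preprocessing the authors use, and the tokenized emails and the function name `tf_idf` are made up for the example.

```python
import math
from collections import Counter

# Hypothetical, already tokenized emails (not from the paper's corpus).
EMAILS = [
    ["deadline", "missed", "again"],
    ["meeting", "deadline"],
    ["thanks", "team"],
]


def tf_idf(docs):
    """Return one sparse vector (term -> weight) per document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


vectors = tf_idf(EMAILS)
for vec in vectors:
    print(vec)
```

Terms that appear in many emails, such as "deadline" here, receive a lower weight than terms specific to one email, so the vectors emphasize the vocabulary that distinguishes a message.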

HAL Descartes